ABSTRACT

The inherent discriminative capability of sparse representations
has been exploited recently for hyperspectral target detection.
This approach relies on the observation that the spectral signature
of a pixel can be represented as a linear combination of a few
training spectra drawn from both target and background classes.
The sparse representation corresponding to a given test spectrum
captures class-specific discriminative information crucial for
detection tasks. Spatio-spectral information has also been
introduced into this framework via a joint sparsity model that
simultaneously solves for the sparse features of a group of
spatially local pixels, since such pixels are highly likely to have
similar spectral characteristics. In this paper, we propose a
probabilistic graphical model framework that explicitly learns the
class-conditional correlations between the distinct sparse
representations corresponding to different pixels in a spatial
neighborhood. Simulation results show that the proposed algorithm
outperforms classical hyperspectral target detection algorithms as
well as support vector machines.

Index  Terms —  Hyperspectral  target  detection,  sparsity, 
probabilistic  graphical  models. 

1.  INTRODUCTION 

An important research problem in hyperspectral imaging (HSI) [1]
is hyperspectral target detection, which can be viewed as a binary
classification problem where hyperspectral pixels are labeled as
either target or background based on their spectral
characteristics. Many statistical hypothesis testing techniques [2]
have been proposed for hyperspectral target detection, including
the spectral matched filter (SMF), matched subspace detector (MSD)
and adaptive subspace detector (ASD). Advances in machine learning
theory have contributed to the popularity of support vector
machines (SVM) [3] as a powerful tool to classify hyperspectral
data.

A significant recent advance has exploited the inherent
discriminative nature of sparse representations for hyperspectral
target detection [4]. The sparsity model is posited on the
observation that spectral signatures of pixels from the same class
(target or background) lie in a low-dimensional subspace.
Consequently, the spectral signature of a test pixel can be
represented by a linear combination of a few training spectra drawn
from an over-complete dictionary built using the target and
background subspaces. The associated sparse representation, which
is obtained as the solution to a sparsity-constrained optimization
problem, has been shown to capture class-specific discriminative
information crucial for detection and classification tasks.
Typically in hyperspectral images, pixels in a small spatial
neighborhood belong to the same class and their spectra are highly
correlated, a fact not exploited by the pixel-wise sparse
representation model. To address this issue, a joint sparsity model
has been proposed recently [5] to simultaneously capture spatial
and spectral characteristics. The spectral signatures of pixels in
a local spatial neighborhood (of the pixel of interest) are
constrained to be represented by a common collection of training
spectra, albeit with different weights. A simultaneous sparse
recovery problem is then solved to recover both the training
support and the corresponding sparse representation vectors.

This work has been partially supported by NSF under Grants CCF-

Motivation: The resulting sparse representations are
discriminative in nature. Hence, the detection statistic in [4,5]
involves a simple comparison of reconstruction residuals using the
target and background subspaces separately. The sparse
representations corresponding to different pixels in a local
neighborhood are statistically correlated, and this correlation is
captured intuitively by the joint sparsity model. A challenging
open problem, therefore, is to mine the class-conditional
correlations among these distinct feature representations in a more
principled manner for detection and classification. In this paper,
we propose a probabilistic graphical model framework to explicitly
learn the conditional dependencies between the sparse features via
discriminative graphs.

Overview of contribution: The sparse representation vectors of
pixels at different locations in a local spatial neighborhood
(relative to a central pixel) comprise several distinct sets of
features which provide complementary yet correlated information
useful for detection. To learn these statistical correlations, we
first learn a pair of discriminative tree graphs (one for each
class) for each distinct set of features [6], and then iteratively
add new edges to these initially disjoint graphs via boosting [7].
Consequently, we learn a discriminative classifier over the sparse
features, unlike the reconstruction residual-based detection scheme
in [5].

Fig. 1. Hyperspectral image detection using discriminative
graphical models on sparse feature representations obtained from
local pixel neighborhoods. The rightmost figure shows the final
learnt graphs, where solid lines represent the initial disjoint
graphs and the dashed lines represent newly learnt edges which
capture conditional correlations.


2.  BACKGROUND 


2.1.  Sparsity  Models  for  Hyperspectral  Target  Detection 


Let $y \in \mathbb{R}^B$ be a pixel, with $B$ indicating the number
of spectral bands, $D_b \in \mathbb{R}^{B \times N_b}$ be the
sub-dictionary whose columns are the $N_b$ background training
samples, and $D_t \in \mathbb{R}^{B \times N_t}$ be the
sub-dictionary whose columns are the $N_t$ target training samples.
The HSI pixel $y$ can then be written as:


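$$
y = D_b \alpha_b + D_t \alpha_t
  = [D_b \;\; D_t] \underbrace{\begin{bmatrix} \alpha_b \\ \alpha_t \end{bmatrix}}_{\alpha}
  = D\alpha, \qquad (1)
$$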


where $D \in \mathbb{R}^{B \times N}$ with $N = N_b + N_t$ is a
dictionary consisting of training samples from both target and
background classes, and $\alpha \in \mathbb{R}^N$ is a sparse
vector. Given the overcomplete dictionary $D$, the sparse
coefficient vector $\alpha$ is obtained by solving the following
optimization problem:


$$
\hat{\alpha} = \arg\min \|\alpha\|_0 \quad \text{subject to} \quad \|y - D\alpha\|_2 \le \epsilon, \qquad (2)
$$

where $\epsilon$ is a suitably chosen reconstruction error
tolerance. The class label of $y$ is determined by comparing the
reconstruction residuals:

$$
R(y) = \|y - D_b \hat{\alpha}_b\|_2 - \|y - D_t \hat{\alpha}_t\|_2, \qquad (3)
$$

where $\hat{\alpha}_b$ and $\hat{\alpha}_t$ are, respectively, the
sets of coefficients in $\hat{\alpha}$ corresponding to $D_b$ and
$D_t$. The test vector $y$ is identified as a target pixel if
$R(y)$ is larger than some suitably chosen positive threshold
$\delta$; if not, it is labeled as a background pixel.
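
As an illustration of (2) and (3), the following Python sketch approximates the sparse recovery with greedy orthogonal matching pursuit (OMP) and then forms the residual statistic R(y). The choice of OMP, the use of NumPy, and the names `omp` and `residual_statistic` are assumptions made for this example; the text only requires some solver for (2).

```python
import numpy as np

def omp(D, y, eps, max_atoms=None):
    """Greedy approximation to (2): add atoms until ||y - D a||_2 <= eps."""
    n_atoms = D.shape[1]
    max_atoms = n_atoms if max_atoms is None else max_atoms
    y = np.asarray(y, dtype=float)
    support, coef = [], np.zeros(0)
    residual = y.copy()
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(n_atoms)
    if support:
        alpha[support] = coef
    return alpha

def residual_statistic(D_b, D_t, y, eps=1e-6):
    """R(y) from (3): background residual minus target residual."""
    y = np.asarray(y, dtype=float)
    alpha = omp(np.hstack([D_b, D_t]), y, eps)
    n_b = D_b.shape[1]
    r_b = np.linalg.norm(y - D_b @ alpha[:n_b])
    r_t = np.linalg.norm(y - D_t @ alpha[n_b:])
    return r_b - r_t
```

On a toy dictionary whose atoms are standard basis vectors, a pixel aligned with a target atom yields R(y) > 0 and a pixel aligned with a background atom yields R(y) < 0, which matches the thresholding rule above.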

This pixel-wise sparsity model is extended in [5] to incorporate
local spatial information by enforcing a common support set of
training spectra for a collection of neighboring pixels
$\{y_i\}_{i=1}^{T}$, as follows:

$$
Y = [y_1 \;\; y_2 \;\; \cdots \;\; y_T]
  = [D\alpha_1 \;\; D\alpha_2 \;\; \cdots \;\; D\alpha_T]
  = D \underbrace{[\alpha_1 \;\; \alpha_2 \;\; \cdots \;\; \alpha_T]}_{S}
  = DS. \qquad (4)
$$

Since all the pixels in $Y$ are represented by the same collection
of training spectra in $D$, the vectors $\alpha_i$,
$i = 1, \ldots, T$, all have non-zero entries at the same
locations. As a result, $S$ is a sparse matrix with only a few
non-zero rows, and is recovered by solving the following
constrained optimization problem:

$$
\hat{S} = \arg\min \|Y - DS\|_F \quad \text{subject to} \quad \|S\|_{\text{row},0} \le K_0, \qquad (5)
$$

where $\|S\|_{\text{row},0}$ denotes the number of non-zero rows of
$S$ and $\|\cdot\|_F$ is the Frobenius norm. The problem in (5) can
be approximately solved by the greedy Simultaneous Orthogonal
Matching Pursuit (SOMP) algorithm [8].
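
A minimal sketch of a SOMP-style greedy solver for (5) is given below. It is illustrative rather than a reproduction of [8]: atoms are scored here by the l2 norm of their correlations with the residuals of all T pixels (other norms are also used in practice), and the function name `somp` and the NumPy implementation are assumptions.

```python
import numpy as np

def somp(D, Y, k0):
    """Greedy approximation to (5): select k0 atoms shared by all columns of Y."""
    Y = np.asarray(Y, dtype=float)
    support = []
    coef = np.zeros((0, Y.shape[1]))
    residual = Y.copy()
    for _ in range(k0):
        # Score each atom by its joint correlation with every pixel residual.
        scores = np.linalg.norm(D.T @ residual, axis=1)
        if support:
            scores[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Least-squares fit of all pixels on the common support.
        coef, *_ = np.linalg.lstsq(D[:, support], Y, rcond=None)
        residual = Y - D[:, support] @ coef
    S = np.zeros((D.shape[1], Y.shape[1]))
    S[support, :] = coef
    return S, support
```

Because every column of the returned S is non-zero only on the common support, S has at most k0 non-zero rows, which is the row-sparsity structure described above.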

2.2.  Probabilistic  Graphical  Models 

Probabilistic graphical models provide a convenient way of
visualizing the correlations between the individual random
variables in a multivariate probability distribution. The random
variables are represented by the nodes
$\mathcal{V} = \{v_1, \ldots, v_n\}$ in a graph $\mathcal{G}$, and
the (undirected) edges $\mathcal{E} \subset \binom{\mathcal{V}}{2}$
which connect pairs of nodes identify conditional dependencies. A
graphical model hence approximates the joint probability
distribution function by a product of terms that represent pairwise
and marginal statistics. Many recent applications [9] have
demonstrated the ability of graphical models to learn models for
high-dimensional data using limited training data (a typical
scenario for practical HSI applications) under moderate
computational complexity.

As the starting point for our contribution, we consider a recent
discriminative learning framework [6] wherein a pair of graphs is
jointly learnt by minimizing the classification error.
Specifically, the tree-approximate J-divergence (a symmetric
extension of the Kullback-Leibler (KL) distance) between two
distributions $p$ and $q$ is maximized:

$$
J(\hat{p}, \hat{q}; p, q) = \int \big(p(x) - q(x)\big) \log\left[\frac{\hat{p}(x)}{\hat{q}(x)}\right] dx. \qquad (6)
$$
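
For intuition, the discrete analogue of (6) replaces the integral with a sum over outcomes. The sketch below (function names are illustrative) evaluates it and checks the standard identity that, when the approximations coincide with the true distributions, the quantity reduces to the symmetric sum of the two KL distances.

```python
import math

def kl(p, q):
    """KL distance D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tree_approx_j(p, q, p_hat, q_hat):
    """Discrete analogue of (6): sum of (p - q) * log(p_hat / q_hat)."""
    return sum((pi - qi) * math.log(ph / qh)
               for pi, qi, ph, qh in zip(p, q, p_hat, q_hat))

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
# With p_hat = p and q_hat = q, the value equals D(p||q) + D(q||p).
j_exact = tree_approx_j(p, q, p, q)
```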


Based on the observation that maximizing the J-divergence minimizes
the upper bound on the probability of classification error, the
discriminative learning problem then becomes:

$$
(\hat{p}, \hat{q}) = \arg\max_{\hat{p},\, \hat{q} \text{ are trees}} J(\hat{p}, \hat{q}; \tilde{p}, \tilde{q}), \qquad (7)
$$

where $\tilde{p}$ and $\tilde{q}$ are the available empirical
estimates. The problem in (7) is shown to decouple into two
maximum-weight spanning tree (MWST) problems [6]:

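$$
\begin{aligned}
\hat{p} &= \arg\min_{\hat{p} \text{ is a tree}} D(\tilde{p} \,\|\, \hat{p}) - D(\tilde{q} \,\|\, \hat{p}), \\
\hat{q} &= \arg\min_{\hat{q} \text{ is a tree}} D(\tilde{q} \,\|\, \hat{q}) - D(\tilde{p} \,\|\, \hat{q}),
\end{aligned}
\qquad (8)
$$

where $D(p \,\|\, \hat{p}) = E_p[\log(p/\hat{p})]$ represents the
KL-distance. From (8), we see that the optimal choice of $\hat{p}$
($\hat{q}$) minimizes its distance to $\tilde{p}$ ($\tilde{q}$)
while simultaneously maximizing its distance from $\tilde{q}$
($\tilde{p}$). The trade-off between generalization and performance
inherent to graphical models is resolved by iteratively thickening
the initial graph with more edges via boosting [7] to learn a
richer structure.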
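
Once edge weights are available, each problem in (8) reduces to a maximum-weight spanning tree search. The sketch below uses Kruskal's algorithm with a union-find structure on a hypothetical weight table; the weights in [6] are derived from pairwise statistics of the empirical distributions, and that computation is not reproduced here.

```python
def max_weight_spanning_tree(n_nodes, weights):
    """Kruskal's algorithm. `weights` maps an edge (i, j) to its weight."""
    parent = list(range(n_nodes))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    # Visit edges from heaviest to lightest; keep those that join two components.
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Hypothetical edge weights on a 4-node graph (placeholders, not from [6]).
example_weights = {(0, 1): 0.9, (1, 2): 0.1, (0, 2): 0.8, (2, 3): 0.5, (1, 3): 0.4}
example_tree = max_weight_spanning_tree(4, example_weights)
```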

As discussed earlier, the sparse representations from different
pixels in a local spatial neighborhood are correlated, and our
contribution is an attempt to explicitly learn these conditional
dependencies. To this end, we instantiate our recent discriminative
graphical framework [10] for HSI detection.


3.  DISCRIMINATIVE  GRAPHICAL  MODELS  FOR 
HYPERSPECTRAL  TARGET  DETECTION 

In this section, we introduce our proposed
Local-Sparsity-Graphical-Model (LSGM) approach for joint sparsity
and graphical model-based HSI detection. An illustration of the
overall framework is shown in Fig. 1. Algorithm 1 outlines the
steps in the process, which consists of an offline training stage
(Steps 1-4) followed by an online test stage (Steps 5-6). The
discriminative graphs are learnt in the training stage. First,
feature vectors (i.e., sparse vectors with respect to a given $D$)
of training samples and their neighboring pixels are obtained by
solving the joint sparse recovery problem in (5).

Let $T$ be the size of the neighborhood. For every pixel
$y \in \mathbb{R}^B$, $T$ different features
$\alpha_l \in \mathbb{R}^N$, $l = 1, 2, \ldots, T$, are obtained,
as illustrated in Fig. 1 for a 3 × 3 neighborhood with $T = 9$.
Training features for class $C_t$ correspond to pixels in a
neighborhood of training target samples, while features for $C_b$
are the sparse vectors associated with neighbors of background
training samples. For each of the $T$ sets of features, a pair of
$N$-node discriminative tree graphs $\mathcal{G}^t_l$ and
$\mathcal{G}^b_l$, which respectively approximate the class
distributions $f(\alpha_l | C_t)$ and $f(\alpha_l | C_b)$, is
simultaneously learnt. The initial disjoint graphs with $TN$ nodes
representing the class distributions corresponding to $C_t$ and
$C_b$ are then generated by separately concatenating the nodes of
$\mathcal{G}^t_l$, $l = 1, \ldots, T$, and $\mathcal{G}^b_l$,
$l = 1, \ldots, T$, respectively. These graphs with sparse edge
structure are then iteratively thickened via boosting [10].
Different pairs of discriminative graphs over the same sets of
nodes with different weights are learnt in different iterations,
and the newly-learnt edges are used to augment the graphs. The
final "thickened" graphs $\mathcal{G}^t$ and $\mathcal{G}^b$ are
shown in Fig. 1.

Algorithm 1 LSGM (Steps 1-4 offline)

1: Feature extraction (training): Compute sparse representations
   $\alpha_l$, $l = 1, \ldots, T$, for neighboring pixels of the
   training data
2: Initial disjoint graphs: Discriminatively learn $T$ pairs of
   $N$-node tree graphs $\mathcal{G}^t_l$ and $\mathcal{G}^b_l$ on
   $\{\alpha_l\}$, for $l = 1, \ldots, T$, obtained from training
   data
3: Separately concatenate nodes corresponding to the two classes,
   to generate initial graphs
4: Boosting on disjoint graphs: Iteratively thicken initial
   disjoint graphs via boosting to obtain final graphs
   $\mathcal{G}^t$ and $\mathcal{G}^b$
{Online process}
5: Feature extraction (test): Obtain sparse representations
   $\alpha_l$, $l = 1, \ldots, T$, in $\mathbb{R}^N$ from test
   image
6: Inference: Classify based on output of the resulting classifier
   using (9).

The above process is performed offline. The classification of a
new test sample is then performed online. Features $\alpha$ are
extracted from the test sample $y$ by solving the sparse recovery
problem in (5) for the $T$ pixels in the neighborhood centered at
$y$. Let $f(\alpha | C_t)$ and $f(\alpha | C_b)$ denote the
probability distribution functions for the final graphs
$\mathcal{G}^t$ and $\mathcal{G}^b$ learnt for $C_t$ and $C_b$,
respectively. The class label of $y$ is finally determined as
follows:

$$
\text{Class}(y) =
\begin{cases}
\text{Target} & \text{if } \log\left(\dfrac{f(\alpha | C_t)}{f(\alpha | C_b)}\right) > 0 \\[2ex]
\text{Background} & \text{if } \log\left(\dfrac{f(\alpha | C_t)}{f(\alpha | C_b)}\right) < 0.
\end{cases}
\qquad (9)
$$
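
The decision rule in (9) is a log-likelihood ratio test and can be sketched as follows. The learnt graphical-model densities are not reproduced here: independent Gaussian log-densities stand in for them purely for illustration, the function names are hypothetical, and assigning a ratio of exactly zero to the background class is an assumption, since (9) leaves that case unspecified.

```python
import math

def gaussian_log_density(mean, var):
    """Return log f(.) for independent Gaussians (a stand-in for a learnt graph)."""
    def log_f(alpha):
        return sum(-0.5 * math.log(2 * math.pi * var) - (a - mean) ** 2 / (2 * var)
                   for a in alpha)
    return log_f

def classify(alpha, log_f_target, log_f_background):
    """Rule (9): compare the log-likelihood ratio with zero."""
    llr = log_f_target(alpha) - log_f_background(alpha)
    return "target" if llr > 0 else "background"

log_f_t = gaussian_log_density(mean=1.0, var=1.0)  # placeholder for f(alpha | C_t)
log_f_b = gaussian_log_density(mean=0.0, var=1.0)  # placeholder for f(alpha | C_b)
```

Working in the log domain avoids numerical underflow when the densities are products of many pairwise and marginal terms.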


4.  EXPERIMENTAL  RESULTS  AND  DISCUSSION 

Hyperspectral images from the HYDICE forest radiance I data
collection (FR-I) [11] are used for the experiment. The HYDICE
sensor generates 210 bands across the whole spectral range from 0.4
to 2.5 μm, spanning the visible and short-wave infrared bands. The
scene includes 14 targets. Only 150 of the 210 available bands are
retained after removing the absorption and low-SNR bands. The
target sub-dictionary $D_t$ comprises 18 training spectra chosen
from the leftmost target in the scene, while the background
sub-dictionary $D_b$ has 216 training spectra chosen using the dual
window technique described in [4].

Table 1. Confusion matrix for the FR-I hyperspectral image.
Four different methods are compared. ($N_t = 18$ and $N_b = 216$.)

Class         Target    Background    Method
Target        0.6512    0.3488        MSD
              0.9493    0.0507        SVM-CK
              0.9556    0.0444        SOMP
              0.9612    0.0388        LSGM
Background    0.0239    0.9761        MSD
              0.0090    0.9910        SVM-CK
              0.0097    0.9903        SOMP
              0.0086    0.9914        LSGM

Four different methods are compared: (i) the classical matched
subspace detector (MSD), which operates on each pixel independently
[12]; (ii) composite kernel support vector machines (SVM-CK), which
consider a weighted sum of spectral and spatial information [3];
(iii) simultaneous orthogonal matching pursuit (SOMP), which
involves solving Eq. (5) with a 3 × 3 local window [5]; and (iv)
the proposed LSGM approach, with the same 3 × 3 window used to
generate the sparse features. Table 1 shows the confusion matrix,
in which each row represents the true class of the test pixels and
each column represents the output of the specified classifier. The
proposed LSGM method offers the best target detection rate (0.9612,
against 0.9556 for SOMP, 0.9493 for SVM-CK and 0.6512 for MSD) and
the lowest false alarm rate (0.0086). The improvement over SOMP can
be attributed to the use of an explicit discriminative classifier
in LSGM. All approaches identify the background class with a
reasonably high degree of accuracy.

Fig.  2  shows  the  receiver  operating  characteristics  (ROC) 
curve  for  the  detection  problem.  The  ROC  curve  describes  the 
probability  of  detection  (PD)  as  a  function  of  the  probability 
of  false  alarms  (PFA).  To  calculate  the  ROC  curve,  a  large 
number  of  thresholds  are  chosen  between  the  minimum  and 
maximum  of  the  detector  output,  and  class  labels  for  all  test 
pixels  are  determined  at  each  threshold.  The  PFA  is  calculated 
as  the  ratio  of  the  number  of  false  alarms  (background  pixels 
determined  as  target)  to  the  total  number  of  pixels  in  the  test 
region,  while  the  PD  is  the  ratio  of  the  number  of  hits  (target 
pixels  correctly  determined  as  target)  to  the  total  number  of 
true  target  pixels.  It  can  be  seen  that  the  proposed  LSGM 
approach  offers  the  best  overall  detection  performance. 
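
The ROC computation described above can be sketched as follows. The function name and the evenly spaced threshold grid are assumptions; the PFA denominator follows the definition given in the text (total number of pixels in the test region), and a pixel is declared a target when its detector output is at or above the threshold.

```python
def roc_points(scores, is_target, n_thresholds=100):
    """Return a list of (PFA, PD) pairs, one per threshold (n_thresholds >= 2)."""
    lo, hi = min(scores), max(scores)
    n_pixels = len(scores)
    n_targets = sum(1 for t in is_target if t)
    points = []
    for k in range(n_thresholds):
        thr = lo + (hi - lo) * k / (n_thresholds - 1)
        # Hits: target pixels declared target at this threshold.
        hits = sum(1 for s, t in zip(scores, is_target) if s >= thr and t)
        # False alarms: background pixels declared target at this threshold.
        false_alarms = sum(1 for s, t in zip(scores, is_target) if s >= thr and not t)
        points.append((false_alarms / n_pixels, hits / n_targets))
    return points

# Toy detector outputs for four pixels; the last two are true targets.
toy_points = roc_points([1, 4, 3, 8], [False, False, True, True], n_thresholds=8)
```

Sweeping the threshold from the minimum to the maximum detector output moves the operating point from detecting everything (highest PD and PFA) to detecting only the strongest response.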